{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# COMPSCI 389: Introduction to Machine Learning\n", "# Topic 4.0 Model Evaluation\n", "\n", "In this notebook we will consider ways of evaluating how effective supervised learning algorithms are.\n", "\n", "Let's start with the imports that we will use in this notebook:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn.neighbors import KDTree\n", "from sklearn.base import BaseEstimator\n", "import numpy as np\n", "\n", "# New this time:\n", "from sklearn.model_selection import train_test_split # For splitting into training and testing sets (more on this below!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's load and display the GPA data set:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
physicsbiologyhistoryEnglishgeographyliteraturePortuguesemathchemistrygpa
0622.60491.56439.93707.64663.65557.09711.37731.31509.801.33333
1538.00490.58406.59529.05532.28447.23527.58379.14488.642.98333
2455.18440.00570.86417.54453.53425.87475.63476.11407.151.97333
3756.91679.62531.28583.63534.42521.40592.41783.76588.262.53333
4584.54649.84637.43609.06670.46515.38572.52581.25529.041.58667
.................................
43298519.55622.20660.90543.48643.05579.90584.80581.25573.922.76333
43299816.39851.95732.39621.63810.68666.79705.22781.01831.763.81667
43300798.75817.58731.98648.42751.30648.67662.05773.15835.253.75000
43301527.66443.82545.88624.18420.25676.80583.41395.46509.802.50000
43302512.56415.41517.36532.37592.30382.20538.35448.02496.393.16667
\n", "

43303 rows × 10 columns

\n", "
" ], "text/plain": [ " physics biology history English geography literature Portuguese \\\n", "0 622.60 491.56 439.93 707.64 663.65 557.09 711.37 \n", "1 538.00 490.58 406.59 529.05 532.28 447.23 527.58 \n", "2 455.18 440.00 570.86 417.54 453.53 425.87 475.63 \n", "3 756.91 679.62 531.28 583.63 534.42 521.40 592.41 \n", "4 584.54 649.84 637.43 609.06 670.46 515.38 572.52 \n", "... ... ... ... ... ... ... ... \n", "43298 519.55 622.20 660.90 543.48 643.05 579.90 584.80 \n", "43299 816.39 851.95 732.39 621.63 810.68 666.79 705.22 \n", "43300 798.75 817.58 731.98 648.42 751.30 648.67 662.05 \n", "43301 527.66 443.82 545.88 624.18 420.25 676.80 583.41 \n", "43302 512.56 415.41 517.36 532.37 592.30 382.20 538.35 \n", "\n", " math chemistry gpa \n", "0 731.31 509.80 1.33333 \n", "1 379.14 488.64 2.98333 \n", "2 476.11 407.15 1.97333 \n", "3 783.76 588.26 2.53333 \n", "4 581.25 529.04 1.58667 \n", "... ... ... ... \n", "43298 581.25 573.92 2.76333 \n", "43299 781.01 831.76 3.81667 \n", "43300 773.15 835.25 3.75000 \n", "43301 395.46 509.80 2.50000 \n", "43302 448.02 496.39 3.16667 \n", "\n", "[43303 rows x 10 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Load the data set\n", "df = pd.read_csv(\"https://people.cs.umass.edu/~pthomas/courses/COMPSCI_389_Spring2024/GPA.csv\", delimiter=',') # Read GPA.csv, assuming numbers are separated by commas\n", "#df = pd.read_csv(\"data/GPA.csv\", delimiter=',')\n", "\n", "# Display the data set\n", "display(df)\n", "\n", "# Split into X (inputs) and y (labels)\n", "X = df.iloc[:, :-1]\n", "y = df.iloc[:, -1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall our nearest neighbor implementation:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "class NearestNeighbor(BaseEstimator):\n", " def fit(self, X, y):\n", " # Convert X and y to NumPy arrays if they are DataFrames. 
This makes fit compatible with numpy arrays or DataFrames\n", " if isinstance(X, pd.DataFrame):\n", " X = X.values\n", " if isinstance(y, pd.Series):\n", " y = y.values\n", "\n", " # Store the training data and labels.\n", " self.X_data = X\n", " self.y_data = y\n", " \n", " # Create a KDTree for efficient nearest neighbor search\n", " self.tree = KDTree(X)\n", "\n", " return self\n", "\n", " def predict(self, X):\n", " # Convert X to a NumPy array if it's a DataFrame\n", " if isinstance(X, pd.DataFrame):\n", " X = X.values\n", "\n", " # Query the tree for the nearest neighbors of all points in X\n", " dist, ind = self.tree.query(X, k=1) # ind will be a 2D array where ind[i,j] is the index of the j'th nearest point to the i'th row in X.\n", "\n", " # Extract the nearest labels\n", " return self.y_data[ind[:,0]] # ind[:,0] are the indices of the nearest neighbors to each query (each row in x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation\n", "\n", "Now that we have created our first ML algorithm, how we can we determine how effective it is?\n", "\n", "> **Idea**: Run the model on many data points and compute the average error.\n", "\n", "Let's do this:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average Error: 0.0\n" ] } ], "source": [ "# Train the model on the data\n", "model = NearestNeighbor()\n", "model.fit(X, y)\n", "predictions = model.predict(X)\n", "\n", "# Compute the average error\n", "average_error = (predictions - y).mean()\n", "\n", "print(\"Average Error:\", average_error)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Illusion of Perfect Predictions\n", "\n", "We've seemingly achieved perfect predictions with our model! But let's pause and reflect.\n", "\n", "**Question**: Are our predictions genuinely perfect?\n", "\n", "**Answer**: Not quite. There's a fundamental problem with our approach: we evaluated our model's performance using the **same data** we used to train it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Why This Evaluation Is Misleading\n", "\n", "Evaluating a model on the training data answers the question:\n", "\n", "> How well does our model predict outcomes for data it has already seen?\n", "\n", "But the real question we want to answer is:\n", "\n", "> How well can our model predict outcomes for new, unseen data?\n", "\n", "This is not only a problem for the NN algorithm (although it is particularly clear in this case). This problem arises when you evaluate *any* ML algorithm using the same data (or some of the same data) that was used to train it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Train/Test Splits\n", "\n", "To accurately assess a model's performance, we need to test it on data that it hasn't seen during training. This is where **train/test splits** come into play.\n", "\n", "- **Training Set**: A subset of the data used to train the model. The model learns to make predictions based on this data.\n", "- **Testing Set**: A separate subset used to evaluate the model. This set is not used during training and thus provides an unbiased evaluation of the model's performance on new data.\n", "\n", "By splitting our data into these two sets, we can train our model on one portion and then test its predictions on another, unseen portion. 
This approach gives us a more realistic measure of how well our model will perform in real-world scenarios, where it encounters data it hasn't seen before.\n",
"\n",
"This raises the question: If we have `data_size` points (rows), how many should we use for training and how many for testing?\n",
"\n",
"- If we use too much of the data for training, little is left for testing, so our evaluation will have high variance (it will not be reliable).\n",
"- If we use too little for training, the models we learn will not perform well.\n",
"\n",
"Although there is some research studying how to split data into training and testing sets, the *vast* majority of the time people pick a split like 50/50, 60/40, 40/60, 80/20, or 20/80 based on their intuition about how much data their algorithm needs to produce a good model and how much data will be needed for a reliable evaluation. Let's use 60% of our data for training and 40% for testing.\n",
"\n",
"**Question**: If we take the first X% for training and the last (100-X)% for testing, what's something we should watch out for in real applications?\n",
"\n",
"**Answer**: Sometimes data sets are provided in some sort of order. For example, the student data could be sorted by GPA. We don't want to put all of the high-GPA points into training and the low-GPA points into testing, since that would also bias our evaluation. We therefore randomly select which points go into the training and testing sets. We will use the `train_test_split` function from scikit-learn, which does this for us (when `shuffle=True`)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
physicsbiologyhistoryEnglishgeographyliteraturePortuguesemathchemistry
28091424.58409.10630.61632.05535.21597.34599.84511.14435.55
5059646.34730.26625.33498.99582.52628.91559.91611.36771.44
37171505.06585.74573.60542.00566.37622.29375.83665.16638.87
37197399.97490.58461.35439.84479.78404.52422.41417.93488.64
30458676.03743.46653.00595.96584.78682.12577.30573.07651.62
..............................
7950480.19467.69519.39415.53555.47413.44409.56491.42462.23
33895576.49526.81561.79733.88627.47646.09649.64653.90613.73
1334389.58401.12308.93434.19451.13386.36484.20535.33394.41
16687404.59467.41647.27573.02573.54544.47617.50422.61473.46
39688465.34345.31451.07373.54497.07507.35390.95374.44484.54
\n", "

25981 rows × 9 columns

\n", "
" ], "text/plain": [ " physics biology history English geography literature Portuguese \\\n", "28091 424.58 409.10 630.61 632.05 535.21 597.34 599.84 \n", "5059 646.34 730.26 625.33 498.99 582.52 628.91 559.91 \n", "37171 505.06 585.74 573.60 542.00 566.37 622.29 375.83 \n", "37197 399.97 490.58 461.35 439.84 479.78 404.52 422.41 \n", "30458 676.03 743.46 653.00 595.96 584.78 682.12 577.30 \n", "... ... ... ... ... ... ... ... \n", "7950 480.19 467.69 519.39 415.53 555.47 413.44 409.56 \n", "33895 576.49 526.81 561.79 733.88 627.47 646.09 649.64 \n", "1334 389.58 401.12 308.93 434.19 451.13 386.36 484.20 \n", "16687 404.59 467.41 647.27 573.02 573.54 544.47 617.50 \n", "39688 465.34 345.31 451.07 373.54 497.07 507.35 390.95 \n", "\n", " math chemistry \n", "28091 511.14 435.55 \n", "5059 611.36 771.44 \n", "37171 665.16 638.87 \n", "37197 417.93 488.64 \n", "30458 573.07 651.62 \n", "... ... ... \n", "7950 491.42 462.23 \n", "33895 653.90 613.73 \n", "1334 535.33 394.41 \n", "16687 422.61 473.46 \n", "39688 374.44 484.54 \n", "\n", "[25981 rows x 9 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Average Error: -0.0034164710772428193\n" ] } ], "source": [ "# We already loaded X and y, but do it again as a reminder\n", "X = df.iloc[:, :-1]\n", "y = df.iloc[:, -1]\n", "\n", "# Split the data into training and testing sets (60% train, 40% test)\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, shuffle=True)\n", "\n", "# Display the training data.\n", "display(X_train)\n", "\n", "# Train the model on the training data\n", "model.fit(X_train, y_train)\n", "\n", "# Predict on the testing data\n", "predictions = model.predict(X_test)\n", "\n", "# Compute the average error on the testing data\n", "average_error = (predictions - y_test).mean()\n", "\n", "print(\"Average Error:\", average_error)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the training data is a new DataFrame that maintains the indices from the original DataFrame. This makes it easier to look up corresponding values (e.g., in the label Series). Try setting `shuffle=false` in the previous python cell, and notice how it changes the indices. Turn shuffling back on (and re-run the cell again) before continuing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation Metrics\n", "\n", "Wow, these predictions are *really* good! Given the 9 entrance exam scores we can predict a new applicants GPA to within a couple *thousandths* of a GPA point!\n", "\n", "Lets look at some of these super-accurate predictions:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " label prediction difference\n", "0 1.333330 3.90333 2.57000\n", "1 2.983330 2.86000 -0.12333\n", "2 NaN 2.97333 NaN\n", "3 2.533330 3.41000 0.87667\n", "4 NaN 3.63667 NaN\n", "... ... ... ...\n", "43287 0.333333 NaN NaN\n", "43294 3.473330 NaN NaN\n", "43297 3.633330 NaN NaN\n", "43300 3.750000 NaN NaN\n", "43301 2.500000 NaN NaN\n", "\n", "[27654 rows x 3 columns]\n" ] } ], "source": [ "# The predictions are a numpy array. 
Convert them to a Series\n", "predictions_series = pd.Series(predictions, name='prediction')\n", "\n", "# Calculate the difference\n", "difference = predictions_series - y_test\n", "\n", "# Create a new DataFrame\n", "temp = pd.DataFrame({\n", " 'label': y_test,\n", " 'prediction': predictions_series,\n", " 'difference': difference\n", "})\n", "\n", "print(temp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait, why are we getting NaN? Notice that this DataFrame has 43,300 rows, which is roughly the total number of data points. This should only be the length of the testing set, which is far smaller!\n", "\n", "What's happening is that `train_test_split` preserves the original indexing when producing `y_test`. So, although `y_test.size == 17322`, the indexes of `y_test` span from 0 to 43,302. This is useful in cases where you want to match up labels in `y_test` to their corresponding rows in the original data set. \n", "\n", "We can use `reset_index(drop=True)` [[link]](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html) to reset the indices in `y_test` to the default indexing of 0, 1, 2, ..." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " label prediction difference\n", "0 3.28333 3.903330 0.620000\n", "1 1.33333 2.860000 1.526670\n", "2 3.13333 2.973330 -0.160000\n", "3 3.87667 3.410000 -0.466670\n", "4 2.41667 3.636670 1.220000\n", "... ... ... ...\n", "17317 2.97333 0.066667 -2.906663\n", "17318 2.58667 2.563330 -0.023340\n", "17319 2.25667 2.363330 0.106660\n", "17320 3.73667 3.053330 -0.683340\n", "17321 3.83333 3.076670 -0.756660\n", "\n", "[17322 rows x 3 columns]\n" ] } ], "source": [ "# The predictions are a numpy array. Convert them to a Series\n", "predictions_series = pd.Series(predictions, name='prediction')\n", "y_test_series = pd.Series(y_test, name='label').reset_index(drop=True) # We reset the indices in y_test. drop=True means to discard the old indices. If False, it keeps the old index as a new column rather than discarding it.\n", "\n", "# Calculate the difference\n", "difference = predictions_series - y_test_series\n", "\n", "# Create a new DataFrame\n", "temp = pd.DataFrame({\n", " 'label': y_test_series,\n", " 'prediction': predictions_series,\n", " 'difference': difference\n", "})\n", "\n", "print(temp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Something's wrong here! These aren't that accurate. Almost all are off by way more than a few thousandths of a GPA point. \n", "\n", "Before going on, note that we could obtain these values (in a different order) with the following. Here `predictions` is a numpy array, while `y_test` is a Series, so the result is a Series. The Series includes index information. Notice that the indices are not in order. The previous discussion should make it clear why this is." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "16950 0.620000\n", "35803 1.526670\n", "9713 -0.160000\n", "29230 -0.466670\n", "8629 1.220000\n", " ... \n", "5321 -2.906663\n", "39267 -0.023340\n", "15494 0.106660\n", "19800 -0.683340\n", "28249 -0.756660\n", "Name: gpa, Length: 17322, dtype: float64\n" ] } ], "source": [ "print(predictions - y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question**: These errors seem bigger than expected based on our evaluation. 
What are we doing wrong?\n", "\n", "**Answer**: We are computing the mean error (or average error), which lets positive and negative errors cancel out! This measures whether we are on average over-predicting or under-predicting. We are (on average) under-predicting by a slight amount.\n", "\n", "There are several alternative metrics that can better quantify the accuracy of a model for a regression problem. We review four of the most common:\n", "\n", "#### Mean Squared Error (MSE)\n", "\n", "MSE measures the average of the squares of the errors. It gives a higher weight to larger errors, making it sensitive to outliers. It's useful when large errors are particularly undesirable.\n", "\n", "$$\\operatorname{MSE}=\\frac{1}{n}\\sum_{i=1}^n (y_i-\\hat y_i)^2,$$\n", "\n", "where $n$ is the size of the testing set, $y_i$ is the $i^\\text{th}$ label, and $\\hat y_i$ is the $i^\\text{th}$ prediction.\n", "\n", "#### Root Mean Squared Error (RMSE)\n", "\n", "RMSE is the square root of MSE. It has the same units as the target variable (the same scale), making it easier to interpret. Like MSE, it gives more weight to larger errors.\n", "\n", "$$\\operatorname{RMSE}=\\sqrt{\\frac{1}{n}\\sum_{i=1}^n (y_i-\\hat y_i)^2}.$$\n", "\n", "#### Mean Absolute Error (MAE)\n", "MAE measures the average magnitude of the errors in a set of predictions, without considering their sign. It's less sensitive to outliers compared to MSE and RMSE (this can be a good thing or a bad thing, depending on your application).\n", "\n", "$$\\operatorname{MAE}=\\frac{1}{n}\\sum_{i=1}^n \\left \\vert y_i - \\hat y_i \\right \\vert.$$\n", "\n", "#### R-squared ($R^2$)\n", "\n", "R-squared, or the *coefficient of determination*, indicates the proportion of the variance of the dependent variable (labels) that is predictable from the independent variables (predictions). Unlike the other metrics, a higher $R^2$ indicates a better fit.\n", "\n", "$$R^2=1-\\frac{\\sum_{i=1}^n (y_i-\\hat y_i)^2}{\\sum_{i=1}^n (y_i - \\bar y)^2},$$\n", "\n", "where $\\bar y = \\frac{1}{n}\\sum_{i=1}^n y_i$ is the average label. In this equation the numerator measures the unexplained variance by the model and the denominator measures the total variance in the actual labels.\n", "\n", "\n", "Let's create functions for computing these different metrics given an array or Series of predictions and labels." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def mean_squared_error(predictions, labels):\n", " return np.mean((predictions - labels) ** 2)\n", "\n", "def root_mean_squared_error(predictions, labels):\n", " return np.sqrt(mean_squared_error(predictions, labels))\n", "\n", "def mean_absolute_error(predictions, labels):\n", " return np.mean(np.abs(predictions - labels))\n", "\n", "def r_squared(predictions, labels):\n", " ss_res = np.sum((labels - predictions) ** 2) # ss_res is the \"Sum of Squares of Residuals\"\n", " ss_tot = np.sum((labels - np.mean(labels)) ** 2) # ss_tot is the \"Total Sum of Squares\"\n", " return 1 - (ss_res / ss_tot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use these functions to test how well our NN algorithm works on the GPA data set." 
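, "\n",
"\n",
"If you want a sanity check on these implementations, scikit-learn ships the same metrics in `sklearn.metrics`. Below is a minimal sketch of that cross-check; it assumes the `predictions` and `y_test` variables from the cells above, and aliases the imports so they don't shadow the functions we just defined:\n",
"\n",
"```python\n",
"# Cross-check our metric functions against scikit-learn's built-in implementations.\n",
"# The aliases avoid shadowing the mean_squared_error/mean_absolute_error defined above.\n",
"from sklearn.metrics import mean_squared_error as sk_mse, mean_absolute_error as sk_mae, r2_score\n",
"\n",
"print(\"sklearn MSE:\", sk_mse(y_test, predictions))    # should match mean_squared_error(predictions, y_test)\n",
"print(\"sklearn MAE:\", sk_mae(y_test, predictions))    # should match mean_absolute_error(predictions, y_test)\n",
"print(\"sklearn R^2:\", r2_score(y_test, predictions))  # should match r_squared(predictions, y_test)\n",
"```"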
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average Error: -0.0034164710772428193\n", "Mean Squared Error: 1.141463152657873\n", "Root Mean Squared Error: 1.0683927895010679\n", "Mean Absolute Error: 0.8195677513104722\n", "R-squared: -0.6978132552064493\n" ] } ], "source": [ "# Compute the average error and other metrics on the testing data\n", "average_error = (predictions - y_test).mean()\n", "mse = mean_squared_error(predictions, y_test)\n", "rmse = root_mean_squared_error(predictions, y_test)\n", "mae = mean_absolute_error(predictions, y_test)\n", "r2 = r_squared(predictions, y_test)\n", "\n", "# Print the metrics\n", "print(\"Average Error:\", average_error)\n", "print(\"Mean Squared Error:\", mse)\n", "print(\"Root Mean Squared Error:\", rmse)\n", "print(\"Mean Absolute Error:\", mae)\n", "print(\"R-squared:\", r2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These give a much more clear picture of how accurate the model is. Some area easier to interpret than others, but all can be used to compare the performance of different ML methods." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 2 }